Non-Linear Modeling for Dose- and Rate-Response Studies in Agriculture

An Introduction in R

Simon Riley, UF|IFAS Statistical Consulting Unit

UF|IFAS Statistical Consulting Unit

Open to everyone affiliated with IFAS

Provides free assistance with:

  • Experimental design, including sample size & power calculations

  • Data analysis, providing conceptual or applied guidance

  • Interpretation and reporting of results, including preparation of tables and figures

Submit a consulting request at: https://bit.ly/3Goicy9

Teams Page: useful articles, code snippets, and a place to ask general stats questions.

Seminar Series: Basic, software-agnostic introductions to methods, concepts, and best practices in statistics. First Friday (usually) of every month @ 3 PM in 426 McCarty C and on Zoom.

Preface:

  1. Model Trends Using Regression

    Researchers in agronomy frequently perform dose- or rate-response studies. Regression models, rather than ANOVA, are generally the most appropriate way to analyze these data.

  2. Best Model = Good Fit to Data + Useful to Researcher (+ Justified by Theory)

    Non-linear regression models can provide estimates of a parsimonious set of easily interpreted and biologically meaningful parameters, in a way that linear and polynomial regression sometimes cannot.

  3. Don’t Break Up Your Analysis, Build Out Your Model

    In general, it is better to analyze the data from all treatments/years/locations together, rather than separately. This slightly complicates model specification but (1) maximizes the precision of your estimates and (2) greatly facilitates (and improves the statistical power of) subsequent testing.

Preface:

  4. Choose Versatile Software Tools

    It is better to become familiar with general-purpose non-linear modeling packages (e.g., nlme) than to rely on more user-friendly but much less flexible domain-specific R packages.

  5. Breathe

    Fitting non-linear models requires optimization algorithms that are not guaranteed to find a solution, and the flexibility of non-linear modeling increases the scope for model misspecification and user error. Non-linear modeling thus requires more patience than linear modeling, especially at the beginning.

Assumptions & Scope


Observed Value = Deterministic Trend + Random Variation

\[ Y = f(x) + \epsilon \]

\[ \epsilon \sim \mathrm{Normal}(0, \sigma^2) \]

  • Henceforth, we are only concerned with \(f(x)\)

  • Residuals assumed to be normally distributed, independent, with uniform variance

  • Tools do exist to relax these assumptions/extend this model, but are beyond our scope

Linear Regression Models

Simple Regression

Equation:

\[ f(x) = \beta_0 + \beta_1x \]

\(\beta_0\): intercept (i.e., value of y when x = 0)

\(\beta_1\): slope (i.e., change in y per unit change in x)


# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = b0 + b1 ~ 1,
     start = c(b0 = 1, b1 = 0.2))

# Or, equivalently
lm(formula = y ~ x, 
   data = ag_exp)

Simple Regression, Reparameterized

Equation:

\[ f(x) = \alpha_{0} + \frac{(x - \bar{x})}{\alpha_1} \]

\(\alpha_0\): intercept (i.e., value of y when x = mean(x))

\(\alpha_1 = \frac{1}{\beta_1}\): rate (i.e., change in x required for a one-unit change in y)


# R Code: 
gnls(model = y ~ a0 + (x - mean(x))/a1, 
     data = ag_exp, 
     params = a0 + a1 ~ 1,
     start = c(a0 = 2, a1 = 5))
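
The two parameterizations describe the same line; expanding the reparameterized form shows how the coefficients map onto each other (and why sensible starting values can be translated between them):

\[ \alpha_0 + \frac{x - \bar{x}}{\alpha_1} = \underbrace{\left(\alpha_0 - \frac{\bar{x}}{\alpha_1}\right)}_{\beta_0} + \underbrace{\frac{1}{\alpha_1}}_{\beta_1}x \]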

No Intercept Model, or “The One Parameter Line Function”

Equation:

\[ f(x) = \beta_1x \]

\(\beta_1\): slope (i.e. change in y per unit change in x)


# R Code: 
gnls(model = y ~ b1*x, 
     data = ag_exp, 
     params = b1 ~ 1,
     start = c(b1 = 0.2))

# Or, equivalently
lm(formula = y ~ 0 + x, # or y ~ x - 1
   data = ag_exp)

ANCOVA, Common Slope

Equation:

\[ f(x) = \beta_{0i} + \beta_{1}x \]

\(\beta_{0i}\): intercept for the i-th group

\(\beta_{1}\): common slope shared by all groups


# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = list(b0 ~ grp, b1 ~ 1),
     start = c(b0 = c(1, 1), b1 = 0.2))

# Or, equivalently
lm(formula = y ~ grp + x, data = ag_exp)

Interlude - Design Matrices & Contrast Specifications

  • Design matrices (\(\mathbf{X}\)) encode factor effects (the elements of \(\boldsymbol{\beta}\)) in columns of dummy variables

  • There are several, somewhat arbitrary, ways to construct them depending on the contrast coding being used

  • The choice of contrast coding thus determines the interpretation of the coefficients and, with it, reasonable starting values (different codings are essentially reparameterizations of the linear model)

Matrix algebra notation for a linear model:

\(Y = \mathbf{X}\beta + \epsilon\)


For example, with 2 reps each of 2 treatments (treatment contrasts, group 1 as the reference level):

\[ \begin{bmatrix} Y_{1,1} \\ Y_{1,2} \\ Y_{2,1} \\ Y_{2,2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,1} \\ \epsilon_{2,2} \end{bmatrix} \]

Interlude - Design Matrices & Contrast Specifications

“Treatment Contrasts” (R default)

  • Intercept: value for the first factor level (a control, perhaps?)

  • Other coefficients: difference between that factor level’s value and the intercept

# Set contrast specification
options(contrasts = c('contr.treatment', 'contr.poly'))

# print design matrix
model.matrix(~ grp, data = ag_exp)


   
grp (Intercept) grpB
  A           1    0
  A           1    0
  A           1    0
  B           1    1
  B           1    1
  B           1    1

ANCOVA Example:

# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = list(b0 ~ grp, b1 ~ 1),
     start = c(b0 = c(1, 1), b1 = 2))

Interlude - Design Matrices & Contrast Specifications

“SAS Contrasts”

  • Intercept: value for the last factor level (a control, perhaps?)

  • Other coefficients: difference between that factor level’s value and the intercept

# Set contrast specification
options(contrasts = c('contr.SAS', 'contr.poly'))

# print design matrix
model.matrix(~ grp, data = ag_exp)


   
grp (Intercept) grpA
  A           1    1
  A           1    1
  A           1    1
  B           1    0
  B           1    0
  B           1    0

ANCOVA Example:

# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = list(b0 ~ grp, b1 ~ 1),
     start = c(b0 = c(2, -1), b1 = 2))

Interlude - Design Matrices & Contrast Specifications

“Sum-to-Zero Contrasts”

  • Intercept: average value across all factor levels

  • Other coefficients: difference between that factor level’s value and the intercept

# Set contrast specification
options(contrasts = c('contr.sum', 'contr.poly'))

# print design matrix
model.matrix(~ grp, data = ag_exp)


   
grp (Intercept) grp1
  A           1    1
  A           1    1
  A           1    1
  B           1   -1
  B           1   -1
  B           1   -1

ANCOVA Example:

# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = list(b0 ~ grp, b1 ~ 1),
     start = c(b0 = c(2.5, -0.5), b1 = 2))

Interlude - Design Matrices & Contrast Specifications

“No Intercept”

  • Each coefficient is the value for that group

# This works for any contrast specification, but becomes slightly
# complicated in the presence of interactions

# print design matrix
model.matrix(~ 0 + grp, data = ag_exp)


   
grp grpA grpB
  A    1    0
  A    1    0
  A    1    0
  B    0    1
  B    0    1
  B    0    1

ANCOVA Example:

# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = list(b0 ~ 0 + grp, b1 ~ 1),
     start = c(b0 = c(1, 2), b1 = 2))

ANCOVA, Separate Intercepts & Slopes

Equation:

\[ f(x) = \beta_{0i} + \beta_{1i}x \]

\(\beta_0\): intercept for the i-th group

\(\beta_{1i}\): slope for the i-th group


# R Code: 
gnls(model = y ~ b0 + b1*x, 
     data = ag_exp, 
     params = b0 + b1 ~ 0 + grp,
     start = c(b0 = c(1.5, 1.0), b1 = c(0.125, 0.2)))

# Or, equivalently
lm(formula = y ~ 0 + grp + grp:x, data = ag_exp) 

Polynomial Regression

Equation:

\[ f(x) = \beta_{0} + \beta_{1}x + \beta_2x^2 \]

\(\beta_0\): intercept

\(\beta_{1}\): slope, but only at x = 0 (not easy to interpret on its own)

\(\beta_2\): curvature, or “deceleration” (also not easy to interpret on its own)


# R Code: 
gnls(model = y ~ b0 + b1*x + b2*x^2, 
     data = ag_exp, 
     params = b0 + b1 + b2 ~ 1,
     start = c(b0 = 1.0, b1 = 0.2, b2 = -0.015))

# Or, equivalently
lm(formula = y ~ x + I(x^2), data = ag_exp) 

Polynomial Regression, Reparameterized

Equation:

\[ f(x) = \alpha_1 + \kappa(x - \alpha_0)^2 \]

\(\alpha_0\): optimum x value (i.e. x value which maximizes y)

\(\alpha_{1}\): maximum y value

\(\kappa\): “steepness” of the curve (negative when y has a maximum)


# R Code: 
gnls(model = y ~ a1 + k*(x - a0)^2, 
     data = ag_exp, 
     params = a0 + a1 + k ~ 1,
     start = c(a0 = 5, a1 = 1.5, k = -0.02))
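
Expanding the vertex form recovers the usual polynomial coefficients, a useful check when translating starting values between the two parameterizations:

\[ \alpha_1 + \kappa(x - \alpha_0)^2 = \underbrace{(\alpha_1 + \kappa\alpha_0^2)}_{\beta_0} + \underbrace{(-2\kappa\alpha_0)}_{\beta_1}x + \underbrace{\kappa}_{\beta_2}x^2, \qquad \alpha_0 = -\frac{\beta_1}{2\beta_2} \]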

Non-Linear Regression Models


Technical Definition: Functions in which the first derivative with respect to at least one parameter is itself a function of one or more parameters (the takeaway is that it’s not actually about “curviness”, and some functions can be linear or non-linear depending on the parameterization).


Plain(-ish) Language Summary: Functions which exhibit some combination of limiting behavior, such as asymptotically approaching an upper or lower limit, critical points, such as a maximum or minimum, and inflection points, where a trend switches from accelerating to decelerating.

Michaelis-Menten Function

Equation:

\[ f(x) = \frac{\alpha x}{\beta + x} \]

\(\alpha\): asymptote (y value as \(x \rightarrow \infty\))

\(\beta\): half-maximum (x value where y = \(\frac{\alpha}{2}\))


# R Code: 
gnls(model = y ~ a*x/(b + x), 
     data = ag_exp, 
     params = a + b ~ 1,
     start = c(a = 3, b = 1.7))
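
Near the origin the curve is approximately linear with slope \(\alpha/\beta\), which can help when eyeballing starting values from a scatterplot:

\[ f'(x) = \frac{\alpha\beta}{(\beta + x)^2}, \qquad f'(0) = \frac{\alpha}{\beta} \]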

Michaelis-Menten Function

The target rate, \(T_p\), is the x value that achieves some proportion, P, of the asymptote:

\[ T_p = \frac{\beta P}{1-P} \]
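
This follows from setting \(f(T_p) = P\alpha\) and solving for \(T_p\):

\[ \frac{\alpha T_p}{\beta + T_p} = P\alpha \;\Rightarrow\; T_p(1 - P) = P\beta \;\Rightarrow\; T_p = \frac{\beta P}{1 - P} \]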

or, if prices, \(C_x\) and \(C_y\), are known for x and y, respectively, then you can calculate net returns (\(I_{net}\)):

\[ I_{net} = \frac{C_y\alpha x}{\beta + x} - C_x x \]

\(\alpha\): asymptote (y value as \(x \rightarrow \infty\))

\(\beta\): half-maximum (x value where y = \(\frac{\alpha}{2}\))
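
Setting the derivative of \(I_{net}\) with respect to x to zero gives the rate that maximizes net returns (the symbol \(x_{opt}\) is introduced here just for convenience):

\[ \frac{dI_{net}}{dx} = \frac{C_y\alpha\beta}{(\beta + x)^2} - C_x = 0 \;\Rightarrow\; x_{opt} = \sqrt{\frac{C_y\alpha\beta}{C_x}} - \beta \]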

Example 1: Onion Yield Response to Seeding Rate

First, run the following (after installing the required packages, as necessary) to set up our R environment:

# Load packages
library(agridat)
library(tidyverse)
library(nlme)
library(ggResidpanel)
library(emmeans)
library(marginaleffects)
library(car)

# set ggplot aesthetics
my_pal = c('steelblue', 'sienna', 'olivedrab', 'goldenrod')
my_theme = theme_classic()+
  theme(text = element_text(size = 12), 
        legend.position = 'bottom')

options(ggplot2.discrete.colour = my_pal, 
        ggplot2.discrete.fill = my_pal)
theme_set(my_theme)

Example 1: Onion Yield Response to Seeding Rate

These data are adapted from Schabenberger and Pierce (2001) and originally reported in Mead (1970). They contain the yield (\(\mathrm{kg/m^2}\)) of three onion (Allium cepa L.) cultivars when planted at various densities (\(\mathrm{plants/m^2}\)). Schabenberger and Pierce (2001) fit an alternative parameterization of the Michaelis-Menten function and referred to it by a different name.

# Load and prep data
onion = data.frame(
   Cultivar = factor(rep(LETTERS[1:3], each = 10)), 
   Density = c(33.045, 35.629, 64.261, 75.24, 93.323,
               144.129, 192.243, 232.178, 309.678, 334.542,
               23.035, 28.524, 40.903, 56.403, 84.281,
               93.861, 108.823, 173.084, 228.41, 276.74, 
               26.694, 37.997, 47.899, 67.059, 88.587,
               103.226, 181.587, 201.177, 277.063, 326.469), 
   Yield = c(3.49, 3.185, 4.562, 4.537, 4.442, 
             5.434, 5.825, 5.619, 6.441, 6.189,
             3.031, 3.112, 3.833, 4.072, 4.475, 
             4.665, 4.114, 5.764, 5.596, 5.064,
             3.118, 3.48, 3.482, 3.541, 4.323,
             4.036, 5.502, 4.868, 5.541, 5.321))

str(onion, width = 60, strict.width = 'cut')
'data.frame':   30 obs. of  3 variables:
 $ Cultivar: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1..
 $ Density : num  33 35.6 64.3 75.2 93.3 ...
 $ Yield   : num  3.49 3.18 4.56 4.54 4.44 ...

Exploratory Data Analysis (Ex. 1)

ggplot(onion, aes(x = Density, y = Yield, color = Cultivar))+
  # Plot the data itself
  facet_wrap(~Cultivar)+ 
  geom_point(size = 3)

Exploratory Data Analysis (Ex. 1)

ggplot(onion, aes(x = Density, y = Yield, color = Cultivar))+
  # Plot the data itself 
  facet_grid(cols = vars(Cultivar))+ 
  geom_point(size = 3) +
  # Plot linear trend
  geom_smooth(method = 'lm', se = F, size = .5)

Exploratory Data Analysis (Ex. 1)

# Michaelis-Menten function
micmen = function(x, ymax, xhalf){
  ymax*x/(xhalf + x)
}

# Considering just cultivar A
onion |> 
filter(Cultivar == 'A') |> 
ggplot(aes(x = Density, y = Yield, color = Cultivar))+
  # Plot the data itself 
  geom_point(size = 3) +
  # Plot linear trend
  geom_smooth(method = 'lm', se = F, size = .5) +
  # Plot Michaelis-Menten trend, using this 
  # to identify reasonable starting values
  geom_function(fun = micmen, 
                args = list(ymax = 7,
                            xhalf = 40), 
                size = 0.5, linetype = 2)

Model Development & Evaluation (Ex. 1)

First, fit the linear model (this could be done using lm).

# Fit the linear (ANCOVA) regression model
onion_mod_lin = gnls(model = Yield ~ yinit + slope*Density,
                    data = onion, 
                    params = yinit + slope ~ 0 + Cultivar, 
                    start = c(yinit = c(3.5, 3.5, 3.5), 
                              slope = c(0.01, 0.01, 0.01)))
print(onion_mod_lin)
Generalized nonlinear least squares fit
  Model: Yield ~ yinit + slope * Density 
  Data: onion 
  Log-likelihood: -16.46228

Coefficients:
yinit.CultivarA yinit.CultivarB yinit.CultivarC slope.CultivarA slope.CultivarB 
    3.579669886     3.375371187     3.229866885     0.009197382     0.008951270 
slope.CultivarC 
    0.008037759 

Degrees of freedom: 30 total; 24 residual
Residual standard error: 0.4683104 

Model Development & Evaluation (Ex. 1)

Next, fit the Michaelis-Menten model:

# Fit the Michaelis-Menten model
onion_mod_mm = gnls(model = Yield ~ ymax*Density/(xhalf + Density), 
                     data = onion, 
                     params = ymax + xhalf ~ 0 + Cultivar, 
                     start = c(ymax = c(7, 7, 7),
                               xhalf = c(40, 40, 40)))
print(onion_mod_mm)
Generalized nonlinear least squares fit
  Model: Yield ~ ymax * Density/(xhalf + Density) 
  Data: onion 
  Log-likelihood: -6.436848

Coefficients:
 ymax.CultivarA  ymax.CultivarB  ymax.CultivarC xhalf.CultivarA xhalf.CultivarB 
       6.863395        5.849601        5.847467       37.721908       23.875150 
xhalf.CultivarC 
      30.484627 

Degrees of freedom: 30 total; 24 residual
Residual standard error: 0.3352747 

Model Development & Evaluation (Ex. 1)

\(\Delta\)AIC strongly favors the non-linear model. The diagnostic plots of the residuals indicate possible minor violations of the assumptions of normality and homoscedasticity, which in other circumstances I would probe further, but that is beyond the scope of this workshop.

# compare the two models
AIC(onion_mod_lin, onion_mod_mm)
              df      AIC
onion_mod_lin  7 46.92457
onion_mod_mm   7 26.87370


onion_mod_res = resid(onion_mod_mm, type = 'normalized')
onion_mod_pred = fitted(onion_mod_mm)
resid_auxpanel(onion_mod_res, onion_mod_pred)

Estimation, Inference and Graphing (Ex. 1)

All the tools from the emmeans package will work with models fit with gnls; you simply need to specify the parameter of interest via the argument param:

# Perform F-Test
joint_tests(onion_mod_mm, param = 'ymax')
 model term df1 df2 F.ratio p.value
 Cultivar     2  23   4.316  0.0256


# Calculate estimated marginal means
(onion_emm_ymax = emmeans(onion_mod_mm, ~ Cultivar,
                         param = 'ymax'))
 Cultivar emmean    SE df lower.CL upper.CL
 A          6.86 0.287 23     6.27     7.46
 B          5.85 0.272 23     5.29     6.41
 C          5.85 0.274 23     5.28     6.41

Confidence level used: 0.95 


# Perform contrasts 
contrast(onion_emm_ymax, 'pairwise')
 contrast estimate    SE df t.ratio p.value
 A - B     1.01379 0.395 23   2.567  0.0438
 A - C     1.01593 0.396 23   2.563  0.0442
 B - C     0.00213 0.386 23   0.006  1.0000

P value adjustment: tukey method for comparing a family of 3 estimates 

Estimation, Inference and Graphing (Ex. 1)

For derived quantities such as \(T_{90\%}\), we can use the delta method to generate estimates, standard errors, and confidence intervals:

# names of coefficients used in formula for 
# derived quantities must match output from coef()
print(names(coef(onion_mod_mm)), width = 40)
[1] "ymax.CultivarA"  "ymax.CultivarB" 
[3] "ymax.CultivarC"  "xhalf.CultivarA"
[5] "xhalf.CultivarB" "xhalf.CultivarC"


# Calculate 90% target rate for each cultivar, and 
# output the results
t90a = deltaMethod(object = onion_mod_mm, 
                   g. = 'xhalf.CultivarA*0.9/(1 - 0.9)', 
                   func = 'T90% for Cultivar A:')

t90b = deltaMethod(object = onion_mod_mm, 
                   g. = 'xhalf.CultivarB*0.9/(1 - 0.9)', 
                   func = 'T90% for Cultivar B:')

t90c = deltaMethod(object = onion_mod_mm, 
                   g. = 'xhalf.CultivarC*0.9/(1 - 0.9)', 
                   func = 'T90% for Cultivar C:')
rbind(t90a, t90b, t90c)
                     Estimate      SE   2.5 % 97.5 %
T90% for Cultivar A:  339.497  56.198 229.352 449.64
T90% for Cultivar B:  214.876  42.219 132.129 297.62
T90% for Cultivar C:  274.362  52.767 170.940 377.78

Estimation, Inference and Graphing (Ex. 1)

The marginaleffects package has additional tools for generating estimates and confidence bands for the fitted trendline:

# Generate a new data set containing all combinations of 
# cultivar levels and a fine grid of density values
toplot = expand.grid(Cultivar = factor(LETTERS[1:3]),
                     Density = 0:33*10) 

# Generate predictions for these new cultivar and density
# combinations
onion_preds = avg_predictions(model = onion_mod_mm, 
                             newdata = toplot, 
                             by = c('Cultivar', 'Density'))

# Plot the results, along with the original data
ggplot(onion_preds, aes(x = Density, color = Cultivar, fill = Cultivar)) +
  facet_grid(cols = vars(Cultivar))+
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
              color = NA, alpha = 0.2) +
  geom_line(aes(y = estimate))+
  geom_point(aes(y = Yield), data = onion, size = 2)

Estimation, Inference and Graphing (Ex. 1)

Log-Logistic Function

Equation:

\[ f(x) = \alpha_0 + \frac{\alpha_1 - \alpha_0}{1+e^{-\beta(\mathrm{ln}(x)-\mathrm{ln}(EC_{50}))}} \]

\(\alpha_0\): intercept (limiting value as \(x \rightarrow 0\))

\(\alpha_1\): lower limit (limiting value as \(x \rightarrow \infty\))

\(\beta\): “slope” (steepness of the transition)

\(EC_{50}\): half-maximal effective concentration (x value where y is midway between \(\alpha_0\) and \(\alpha_1\))


# R Code:
gnls(y ~ a0 + (a1 - a0)/(1 + exp(-b*(log(x) - log(EC50)))),  
     data = ag_exp, 
     params = a0 + a1 + b + EC50 ~ 1, 
     start = c(a0 = 1, a1 = 0.2, b = 1, EC50 = 3))
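
Evaluating the function at \(x = EC_{50}\) confirms its interpretation as the half-maximal point, since the exponent vanishes there:

\[ f(EC_{50}) = \alpha_0 + \frac{\alpha_1 - \alpha_0}{1 + e^0} = \frac{\alpha_0 + \alpha_1}{2} \]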

Exponential Decay Function

Equation:

\[ f(x) = \alpha_1 + (\alpha_0 - \alpha_1)e^{-\frac{x}{\beta}} \]

\(\alpha_0\) is the intercept

\(\alpha_1\) is the lower limit

\(\beta\) is the “slope”


# R Code:
gnls(y ~ a1+(a0-a1)*exp(-x/b),  
     data = ag_exp, 
     params = a0 + a1 + b ~ 1, 
     start = c(a0 = 1, a1 = 0.2, b = 4.5))
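
Here \(\beta\) is a scale parameter rather than a literal slope; solving \(e^{-x/\beta} = 1 - P\) gives the x value at which a proportion P of the total decline from \(\alpha_0\) to \(\alpha_1\) has occurred:

\[ x_P = -\beta\,\mathrm{ln}(1 - P), \qquad \text{e.g., } x_{50\%} = \beta\,\mathrm{ln}(2) \approx 0.693\beta \]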

Hyperbolic Function

Equation:

\[ f(x) = \alpha_1 + \frac{\beta(\alpha_0 - \alpha_1)}{\beta + x} \]

\(\alpha_0\) is the intercept

\(\alpha_1\) is the lower limit

\(\beta\) is the “slope”


# R Code:
gnls(y ~ a1 + b*(a0 - a1)/(b + x),  
     data = ag_exp, 
     params = a0 + a1 + b ~ 1, 
     start = c(a0 = 1, a1 = 0.2, b = 2))
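
As in the exponential model, \(\beta\) is better read as a location parameter than a slope: plugging in \(x = \beta\) shows that y is then exactly midway between the two limits:

\[ f(\beta) = \alpha_1 + \frac{\beta(\alpha_0 - \alpha_1)}{2\beta} = \frac{\alpha_0 + \alpha_1}{2} \]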

Example 2: Herbicide Resistance in Pigweed

The “pigweed” experiment - originally from Ferguson, Hamill, and Tardif (2001) and republished by Bowley (2015) - examined the response (as percent survival) exhibited by two biotypes of Amaranthus powellii S. Watson when exposed to increasing doses of the herbicide imazethapyr.

pigweed = data.frame(
   Biotype = factor(rep(LETTERS[1:2], each = 36)),
   Rate = c(rep(c(1, 4, 8, 16, 32,
                  64, 128, 256, 512),
                each = 4), 
            rep(c(0.001, 0.063, 0.125, 0.25,
                  0.5, 1, 2, 4, 16), 
                each = 4)), 
   Block = factor(rep(c(1:4), 18)), 
   Pct_Survival = c(100, 100, 100, 100, 100, NA, 
                     89.44, 61.53, 97.7, 100, 81.74,
                     61.04, 57.47, 100, 82.42, 73.71,
                     79.31, NA, 52.82, 69.51, 66.67,
                     65.44, 55.38, 51.56, 83.91,
                     76.5, 43.85, 42.18, 68.97, NA,
                     22.44, 34.62, 51.72, 60.14,
                     14.54, 15.85, NA, 100, 100, 100,
                     87.5, 74.3, 97.37, 42.87, 38.28,
                     50.37, 31.21, 27.17, 42.19, 38.68,
                     44.3, 45.87, 35.16, 19.65, 28.4,
                     27.35, 12.5, 14.31, 17.57, 39.44, 
                     7.81, 32.88, 17.63, 22.87, 8.59,
                     21.87, 19.36, 24.06, 11.72, 3.84, 
                     12.95, 19.89))

str(pigweed, width = 60, strict.width = 'cut')
'data.frame':   72 obs. of  4 variables:
 $ Biotype     : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1..
 $ Rate        : num  1 1 1 1 4 4 4 4 8 8 ...
 $ Block       : Factor w/ 4 levels "1","2","3","4": 1 2 3..
 $ Pct_Survival: num  100 100 100 100 100 ...

Exploratory Data Analysis (Ex. 2)

ggplot(pigweed, aes(x = (Rate), y = Pct_Survival, color = Biotype))+
  # Plot the data itself
  facet_grid(cols = vars(Biotype), scales = 'free')+
  geom_point(size = 2) 

Exploratory Data Analysis (Ex. 2)

# Define Log-Logistic function
loglogistic = function(x, ymax, ymin, slope, IC50){
  ymin + (ymax - ymin)/(1 + exp(-slope*(log(x) - log(IC50))))
}

ggplot(pigweed, aes(x = (Rate), y = Pct_Survival, color = Biotype))+
  # Plot the data itself
  facet_grid(cols = vars(Biotype), scales = 'free')+
  geom_point() +
  # Plot the log-logistic function
  geom_function(fun = loglogistic, 
                args = list(ymin = 30, ymax = 100, 
                            IC50 = 200, slope = -50))

Model Development & Evaluation (Ex. 2)

Fit the log-logistic model:

pigweed_mod_ll = gnls(Pct_Survival ~ ymin + (ymax - ymin)/(1+ exp(slope*(log(Rate) - log(EC50)))),
                      data = pigweed,
                      params = list(ymin + ymax ~ 1, 
                                    slope + EC50 ~ Biotype - 1),
                      start = list(ymin = 0, ymax = 100, 
                                   slope = c(.5, .5), 
                                   EC50 = c(1, 0.1)),
                      na.action = na.omit)
print(pigweed_mod_ll)
Generalized nonlinear least squares fit
  Model: Pct_Survival ~ ymin + (ymax - ymin)/(1 + exp(slope * (log(Rate) -      log(EC50)))) 
  Data: pigweed 
  Log-likelihood: -273.9753

Coefficients:
          ymin           ymax slope.BiotypeA slope.BiotypeB  EC50.BiotypeA 
   16.38695930   102.71638822     0.57501493     1.19270649    76.73484213 
 EC50.BiotypeB 
    0.08745889 

Degrees of freedom: 68 total; 62 residual
Residual standard error: 14.24344 

Model Development & Evaluation (Ex. 2)

Fit the hyperbolic model:

pigweed_mod_hy = gnls(Pct_Survival ~ ymin + slope*(y0 - ymin)/(slope+Rate),
                         data = pigweed,
                         params = list(ymin + y0 + slope ~ 0+Biotype),
                         start = list(ymin = c(20, 7), 
                                      y0 = c(100, 100), 
                                      slope = c(20, 0.25)),
                         na.action = na.omit)
print(pigweed_mod_hy)
Generalized nonlinear least squares fit
  Model: Pct_Survival ~ ymin + slope * (y0 - ymin)/(slope + Rate) 
  Data: pigweed 
  Log-likelihood: -274.9822

Coefficients:
 ymin.BiotypeA  ymin.BiotypeB    y0.BiotypeA    y0.BiotypeB slope.BiotypeA 
   34.21815082    14.56277604    94.52104250   102.68822396    50.30093630 
slope.BiotypeB 
    0.08757621 

Degrees of freedom: 68 total; 62 residual
Residual standard error: 14.45592 

Model Development & Evaluation (Ex. 2)

Fit the negative exponential model:

pigweed_mod_ne = gnls(Pct_Survival ~ ymin + (yinit - ymin)*exp(-Rate/slope),
                      data = pigweed,
                      params = list(yinit +  ymin + slope ~ 0+Biotype),
                      start = list(ymin = c(35, 15), 
                                   yinit = c(100, 100), 
                                   slope = c(70, 0.25)),
                      na.action = na.omit)
print(pigweed_mod_ne)
Generalized nonlinear least squares fit
  Model: Pct_Survival ~ ymin + (yinit - ymin) * exp(-Rate/slope) 
  Data: pigweed 
  Log-likelihood: -277.1073

Coefficients:
yinit.BiotypeA yinit.BiotypeB  ymin.BiotypeA  ymin.BiotypeB slope.BiotypeA 
    90.2168592    100.2895962     38.8751801     20.0059165     89.7827055 
slope.BiotypeB 
     0.1316581 

Degrees of freedom: 68 total; 62 residual
Residual standard error: 14.91482 

Model Development & Evaluation (Ex. 2)

AIC suggests that the log-logistic is the best-fitting model, while a visual check of the residuals does not suggest violations of the assumptions of normality or homoscedasticity.

AIC(pigweed_mod_ll, pigweed_mod_hy, pigweed_mod_ne)
               df      AIC
pigweed_mod_ll  7 561.9505
pigweed_mod_hy  7 563.9644
pigweed_mod_ne  7 568.2146


pigweed_res = resid(pigweed_mod_ll, type = 'normalized')
pigweed_pred = fitted(pigweed_mod_ll)
resid_auxpanel(pigweed_res, pigweed_pred)

Estimation, Inference and Graphing (Ex. 2)

All the tools from the emmeans package will work with models fit with gnls; you simply need to specify the parameter of interest via the argument param:

# Perform F-Test
joint_tests(pigweed_mod_ll, param = 'EC50')
 model term df1 df2 F.ratio p.value
 Biotype      1  61   4.702  0.0340


# Calculate estimated marginal means
(pigweed_emm_ll = emmeans(pigweed_mod_ll, ~ Biotype,
                          param = 'EC50'))
 Biotype  emmean      SE df lower.CL upper.CL
 A       76.7348 35.3590 61   6.0302  147.439
 B        0.0875  0.0224 61   0.0427    0.132

Confidence level used: 0.95 


# Perform contrasts, although with only
# two levels it is unneeded
contrast(pigweed_emm_ll, 'pairwise')
 contrast estimate   SE df t.ratio p.value
 A - B        76.6 35.3 61   2.168  0.0340

Estimation, Inference and Graphing (Ex. 2)

The marginaleffects package has additional tools for generating estimates and confidence bands for the fitted trendline:

# Generate a new data set containing all combinations of 
# Biotype and a fine grid of Rate values. Rows with Biotype
# B and high Rates are removed so they do not distort the
# plots later
toplot = expand.grid(Biotype = factor(LETTERS[1:2]),
                     Rate = exp(seq(-6.9, 6.3, .1))) |> 
  filter(!(Biotype == 'B' & Rate > 17))

# Generate predictions for these new biotype and rate
# combinations
pigweed_preds = avg_predictions(model = pigweed_mod_ll, 
                                newdata = toplot, 
                                by = c('Biotype', 'Rate')) 

# Plot the results, along with the original data
ggplot(pigweed_preds, aes(x = Rate, color = Biotype, fill = Biotype)) +
  facet_grid(cols = vars(Biotype), scales = 'free')+
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
              color = NA, alpha = 0.2) +
  geom_line(aes(y = estimate))+
  geom_point(aes(y = Pct_Survival), data = pigweed, size = 2)

Estimation, Inference and Graphing (Ex. 2)

Additional Resources

Chapter 3: Deterministic Functions for Ecological Modeling, from Bolker’s Ecological Models and Data in R, provides an excellent, software-agnostic introduction to non-linear models.

Pinheiro and Bates’s Mixed-Effects Models in S and S-PLUS is the definitive reference for the nlme package. Some chapters are highly mathematical, others very much applied. Most examples are not agriculture-related, though.

Chapter 15: Nonlinear Regression Models and Applications describes a large number of the most commonly used non-linear models in agricultural and related disciplines, and provides example code in R.

Schabenberger and Pierce (2001) give an extended treatment of both the mathematical/theoretical and practical aspects of non-linear modeling in the plant and soil sciences, but the example code is almost all in SAS.

References

Bowley, Stephen R. 2015. Hitchhiker’s Guide to Statistics in Biology: Generalized Linear Mixed Model Edition. 1st ed. Kincardine, ON, Canada: Plants et. al.
Ferguson, Gabrielle M., Allan S. Hamill, and François J. Tardif. 2001. “ALS Inhibitor Resistance in Populations of Powell Amaranth and Redroot Pigweed.” Weed Science 49 (4): 448–53. https://doi.org/10.1614/0043-1745(2001)049[0448:AIRIPO]2.0.CO;2.
Mead, R. 1970. “Plant Density and Crop Yield.” Journal of the Royal Statistical Society Series C: Applied Statistics 19 (1): 64–81. https://doi.org/10.2307/2346843.
Schabenberger, Oliver, and Francis J. Pierce. 2001. Contemporary Statistical Models for the Plant and Soil Sciences. CRC Press. https://books.google.com?id=OyLNBQAAQBAJ.